Overview

Dataset statistics

Number of variables: 14
Number of observations: 233538
Missing cells: 36127
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.9 MiB
Average record size in memory: 112.0 B
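The dataset-level figures can be cross-checked with plain arithmetic. The following is a minimal sketch. It assumes every column occupies a fixed 8 bytes per row, which matches the 1.8 MiB reported for each variable, and it ignores index overhead. The helper names are illustrative and are not part of pandas-profiling.

```python
def missing_cells_pct(n_missing: int, n_rows: int, n_cols: int) -> float:
    """Percentage of empty cells over the whole rows x columns grid."""
    return 100.0 * n_missing / (n_rows * n_cols)


def table_memory_mib(n_rows: int, n_cols: int, bytes_per_cell: int = 8) -> float:
    """Approximate in-memory size in MiB, assuming fixed-width cells (no index overhead)."""
    return n_rows * n_cols * bytes_per_cell / 2**20


def avg_record_bytes(n_cols: int, bytes_per_cell: int = 8) -> float:
    """Average size of one row under the same fixed-width assumption."""
    return float(n_cols * bytes_per_cell)


if __name__ == "__main__":
    rows, cols = 233538, 14
    print(f"Missing cells (%): {missing_cells_pct(36127, rows, cols):.1f}%")
    print(f"Total size in memory: {table_memory_mib(rows, cols):.1f} MiB")
    print(f"Memory per column: {table_memory_mib(rows, 1):.1f} MiB")
    print(f"Average record size in memory: {avg_record_bytes(cols):.1f} B")
```

With the report's numbers this prints 1.1%, 24.9 MiB, 1.8 MiB and 112.0 B.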

Variable types

NUM: 9
CAT: 3
DATE: 1
BOOL: 1

Reproduction

Analysis started: 2020-05-12 02:45:54.808702
Analysis finished: 2020-05-12 02:46:24.110994
Duration: 29.3 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration file: config.yaml

Warnings

VERSIE has constant value "1.0" [Constant]
DATUM_BESTAND has constant value "2020-04-24" [Constant]
PEILDATUM has constant value "2020-04-01" [Constant]
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1766 distinct values [High cardinality]
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD [High correlation]
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD [High correlation]
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG [High correlation]
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG [High correlation]
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC [High correlation]
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC [High correlation]
GEMIDDELDE_VERKOOPPRIJS has 36127 (15.5%) missing values [Missing]
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 20.87549947) [Skewed]
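The single-column warnings above follow simple rules. The sketch below shows how such rules could be applied, and it covers the Constant, High cardinality, Missing and Skewed cases but not the correlation warnings. Both thresholds are assumptions and are not read from config.yaml. A skewness cut-off of 20 is merely consistent with this report: AANTAL_SUBTRAJECT_PER_ZPD (20.88) is flagged, while AANTAL_PAT_PER_ZPD (16.21) is not. The cardinality cut-off of 50 is hypothetical. `column_warnings` is an illustrative helper, not the pandas-profiling API.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence

# Hypothetical thresholds for illustration; not taken from config.yaml.
CARDINALITY_THRESHOLD = 50
SKEWNESS_THRESHOLD = 20.0


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_warnings(name: str, values: Sequence[Hashable], categorical: bool = False,
                    skewness: Optional[float] = None) -> List[str]:
    """Return report-style warning lines for a single column."""
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    warnings: List[str] = []
    if len(counts) == 1:
        (only,) = counts  # the single distinct non-missing value
        warnings.append(f'{name} has constant value "{only}" [Constant]')
    if categorical and len(counts) > CARDINALITY_THRESHOLD:
        warnings.append(
            f"{name} has a high cardinality: {len(counts)} distinct values [High cardinality]")
    n_missing = len(values) - len(present)
    if n_missing:
        pct = 100.0 * n_missing / len(values)
        warnings.append(f"{name} has {n_missing} ({pct:.1f}%) missing values [Missing]")
    if skewness is not None and abs(skewness) > SKEWNESS_THRESHOLD:
        warnings.append(f"{name} is highly skewed (γ1 = {skewness}) [Skewed]")
    return warnings


if __name__ == "__main__":
    print(column_warnings("VERSIE", [1.0, 1.0, 1.0]))
```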

Variables

VERSIE
Boolean

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

Value       Count   Frequency (%)
1           233538  100.0%

DATUM_BESTAND
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

Value       Count   Frequency (%)
2020-04-24  233538  100.0%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

PEILDATUM
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

Value       Count   Frequency (%)
2020-04-01  233538  100.0%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

JAAR
Date

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2020-01-01 00:00:00
Histogram

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

Distinct count: 27
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 422.5759405321618
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 301
5th percentile: 302
Q1: 305
median: 313
Q3: 322
95th percentile: 361
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17
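The quantile block can be recomputed from a column's values. This is a minimal sketch that assumes linear interpolation between order statistics. That rule is pandas' default for `Series.quantile`, and `statistics.quantiles(..., method="inclusive")` uses it as well. Whether pandas-profiling v2.8.0 computes its quantiles exactly this way is an assumption, and `quantile_statistics` is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def quantile_statistics(data: Sequence[float]) -> Dict[str, float]:
    """Quantile block for one numeric column (needs at least 2 values)."""
    # 99 cut points; method="inclusive" treats the minimum as the 0th and the
    # maximum as the 100th percentile and interpolates linearly in between.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    lo, hi = min(data), max(data)
    q1, q3 = cuts[24], cuts[74]
    return {
        "Minimum": lo,
        "5th percentile": cuts[4],
        "Q1": q1,
        "median": cuts[49],
        "Q3": q3,
        "95th percentile": cuts[94],
        "Maximum": hi,
        "Range": hi - lo,
        "Interquartile range (IQR)": q3 - q1,
    }


if __name__ == "__main__":
    print(quantile_statistics(list(range(1, 102))))
```

For the report's column, Range and IQR are simply 8418 - 301 = 8117 and 322 - 305 = 17.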

Descriptive statistics

Standard deviation: 924.108952
Coefficient of variation (CV): 2.186847057
Kurtosis: 70.71843912
Mean: 422.5759405
Median Absolute Deviation (MAD): 8
Skewness: 8.520739096
Sum: 98687540
Variance: 853977.3551
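The descriptive block can be sketched in the same way. This version assumes the sample (n - 1) standard deviation and variance. It also assumes the bias-corrected skewness and excess kurtosis formulas that pandas' `Series.skew` and `Series.kurt` use. Following the label in the report, MAD is taken to be the median absolute deviation from the median. `descriptive_statistics` is a hypothetical helper, not the library's own code.

```python
import math
import statistics
from typing import Dict, Sequence


def descriptive_statistics(data: Sequence[float]) -> Dict[str, float]:
    """Descriptive block for one numeric, non-constant column with at least 4 values."""
    n = len(data)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)          # sample standard deviation (n - 1)
    med = statistics.median(data)
    # Population central moments feed the bias-corrected estimators below.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5                   # biased sample skewness
    g2 = m4 / m2 ** 2 - 3.0               # biased excess kurtosis
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3)),
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - med) for x in data),
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Sum": sum(data),
        "Variance": statistics.variance(data),
    }


if __name__ == "__main__":
    print(descriptive_statistics([1, 2, 3, 4, 5]))
```

As a consistency check on the report itself, CV = 924.108952 / 422.5759405 ≈ 2.1868 and Variance = 924.108952² ≈ 853977.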
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
305                  33114    14.2%
313                  30183    12.9%
303                  26827    11.5%
330                  18786    8.0%
316                  15990    6.8%
308                  11816    5.1%
324                  9746     4.2%
306                  9649     4.1%
301                  9407     4.0%
304                  7565     3.2%
Other values (17)    60455    25.9%

Minimum 5 values

Value       Count   Frequency (%)
301         9407    4.0%
302         5017    2.1%
303         26827   11.5%
304         7565    3.2%
305         33114   14.2%

Maximum 5 values

Value       Count   Frequency (%)
8418        3072    1.3%
1900        153     0.1%
390         575     0.2%
389         2548    1.1%
362         3797    1.6%
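The three value tables list the most common values, the smallest distinct values and the largest distinct values. They can be sketched with `collections.Counter`. Frequencies are shares of all rows, which is what the GEMIDDELDE_VERKOOPPRIJS table implies: its (Missing) row is 15.5% of 233538. This sketch assumes a column without missing values. Ties in count are broken by first occurrence, which may not match the report's order. The helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[object, int, float]  # (value, count, frequency in % of all rows)


def common_values(values: Sequence[Hashable], top: int = 10) -> List[Row]:
    """Top-`top` values by count, then an 'Other values (k)' remainder row."""
    n = len(values)
    counts = Counter(values)
    rows: List[Row] = [(v, c, 100.0 * c / n) for v, c in counts.most_common(top)]
    n_other_values = len(counts) - len(rows)
    if n_other_values:
        other = n - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({n_other_values})", other, 100.0 * other / n))
    return rows


def extreme_values(values: Sequence[float], k: int = 5, largest: bool = False) -> List[Row]:
    """The k smallest (or, with largest=True, the k largest) distinct values with counts."""
    n = len(values)
    counts = Counter(values)
    keys = sorted(counts, reverse=largest)[:k]
    return [(v, counts[v], 100.0 * counts[v] / n) for v in keys]


if __name__ == "__main__":
    sample = [305] * 5 + [313] * 3 + [301] + [8418]
    print(common_values(sample, top=2))
    print(extreme_values(sample, k=2, largest=True))
```

For BEHANDELEND_SPECIALISME_CD, the remainder row is 233538 minus the ten listed counts (173083). That gives the 60455 shown for "Other values (17)".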
 

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct count: 1766
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

Value                Count    Frequency (%)
101                  980      0.4%
402                  954      0.4%
301                  926      0.4%
403                  926      0.4%
203                  880      0.4%
201                  876      0.4%
401                  786      0.3%
404                  775      0.3%
802                  767      0.3%
409                  753      0.3%
Other values (1756)  224915   96.3%

Length

Max length: 4
Median length: 3
Mean length: 3.350152866
Min length: 2
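The Length block for a categorical column summarizes the string length of each value. Here is a minimal sketch, assuming the codes are held as strings. `length_statistics` is hypothetical, and in the demo only "101" and "402" come from the report: "1234" and "12" are made up.

```python
import statistics
from typing import Dict, Sequence


def length_statistics(values: Sequence[str]) -> Dict[str, float]:
    """Length block for a categorical (string) column."""
    lengths = [len(v) for v in values]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
    }


if __name__ == "__main__":
    # Two codes from the report plus two made-up codes of other lengths.
    print(length_statistics(["101", "402", "1234", "12"]))
```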

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct count: 5880
Unique (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 441766792.3267948
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 10501002
5th percentile: 28999036
Q1: 99799050
median: 149599030
Q3: 990004006
95th percentile: 990416029
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204956

Descriptive statistics

Standard deviation: 429373913.8
Coefficient of variation (CV): 0.9719470119
Kurtosis: -1.74209471
Mean: 441766792.3
Median Absolute Deviation (MAD): 119700025
Skewness: 0.4626516201
Sum: 1.031693331e+14
Variance: 1.843619578e+17
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
990004009            1687     0.7%
990004007            1664     0.7%
990003004            1663     0.7%
990004006            1345     0.6%
990356076            1165     0.5%
990356073            1083     0.5%
990003007            1066     0.5%
131999228            1009     0.4%
131999164            1000     0.4%
199299013            964      0.4%
Other values (5870)  220892   94.6%

Minimum 5 values

Value       Count   Frequency (%)
10501002    6       < 0.1%
10501003    8       < 0.1%
10501004    8       < 0.1%
10501005    8       < 0.1%
10501007    3       < 0.1%

Maximum 5 values

Value       Count   Frequency (%)
998418081   112     < 0.1%
998418080   97      < 0.1%
998418079   26      < 0.1%
998418077   5       < 0.1%
998418076   5       < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8475
Unique (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 504.21537394342675
Minimum: 1
Maximum: 152465
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
median: 14
Q3: 102
95th percentile: 1706
Maximum: 152465
Range: 152464
Interquartile range (IQR): 99

Descriptive statistics

Standard deviation: 3091.237089
Coefficient of variation (CV): 6.130787059
Kurtosis: 374.465047
Mean: 504.2153739
Median Absolute Deviation (MAD): 13
Skewness: 16.20594033
Sum: 117753450
Variance: 9555746.743
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
1                    38661    16.6%
2                    19000    8.1%
3                    12303    5.3%
4                    9073     3.9%
5                    7065     3.0%
6                    5927     2.5%
7                    4927     2.1%
8                    4168     1.8%
9                    3875     1.7%
10                   3376     1.4%
Other values (8465)  125163   53.6%

Minimum 5 values

Value       Count   Frequency (%)
1           38661   16.6%
2           19000   8.1%
3           12303   5.3%
4           9073    3.9%
5           7065    3.0%

Maximum 5 values

Value       Count   Frequency (%)
152465      1       < 0.1%
147127      1       < 0.1%
144491      1       < 0.1%
108986      1       < 0.1%
108942      1       < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct count: 9029
Unique (%): 3.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 585.8709460558881
Minimum: 1
Maximum: 239632
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
median: 14
Q3: 111
95th percentile: 1928
Maximum: 239632
Range: 239631
Interquartile range (IQR): 108

Descriptive statistics

Standard deviation: 3882.47443
Coefficient of variation (CV): 6.626842406
Kurtosis: 704.8303162
Mean: 585.8709461
Median Absolute Deviation (MAD): 13
Skewness: 20.87549947
Sum: 136823129
Variance: 15073607.7
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
1                    37293    16.0%
2                    18653    8.0%
3                    12189    5.2%
4                    8933     3.8%
5                    6998     3.0%
6                    5897     2.5%
7                    4943     2.1%
8                    4145     1.8%
9                    3801     1.6%
10                   3401     1.5%
Other values (9019)  127285   54.5%

Minimum 5 values

Value       Count   Frequency (%)
1           37293   16.0%
2           18653   8.0%
3           12189   5.2%
4           8933    3.8%
5           6998    3.0%

Maximum 5 values

Value       Count   Frequency (%)
239632      1       < 0.1%
231931      1       < 0.1%
229679      1       < 0.1%
226567      1       < 0.1%
218433      1       < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 7328
Unique (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7645.161339910422
Minimum: 1
Maximum: 208424
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 42
Q1: 416
median: 1725
Q3: 6467
95th percentile: 36535
Maximum: 208424
Range: 208423
Interquartile range (IQR): 6051

Descriptive statistics

Standard deviation: 17521.54892
Coefficient of variation (CV): 2.291848155
Kurtosis: 31.50762579
Mean: 7645.16134
Median Absolute Deviation (MAD): 1562
Skewness: 4.914266766
Sum: 1785435689
Variance: 307004676.4
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
1                    377      0.2%
19                   371      0.2%
37                   354      0.2%
20                   354      0.2%
11                   340      0.1%
9                    334      0.1%
12                   330      0.1%
2                    327      0.1%
32                   324      0.1%
21                   322      0.1%
Other values (7318)  230105   98.5%

Minimum 5 values

Value       Count   Frequency (%)
1           377     0.2%
2           327     0.1%
3           269     0.1%
4           296     0.1%
5           276     0.1%

Maximum 5 values

Value       Count   Frequency (%)
208424      19      < 0.1%
203213      25      < 0.1%
202452      17      < 0.1%
200163      16      < 0.1%
198510      17      < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8099
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10662.092327586945
Minimum: 1
Maximum: 336711
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 52
Q1: 541
median: 2350
Q3: 8820
95th percentile: 50809
Maximum: 336711
Range: 336710
Interquartile range (IQR): 8279

Descriptive statistics

Standard deviation: 25178.36648
Coefficient of variation (CV): 2.361484567
Kurtosis: 35.71828716
Mean: 10662.09233
Median Absolute Deviation (MAD): 2152
Skewness: 5.189359763
Sum: 2490003718
Variance: 633950138.8
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
1                    332      0.1%
13                   279      0.1%
38                   278      0.1%
34                   271      0.1%
2                    264      0.1%
93                   263      0.1%
11                   259      0.1%
22                   259      0.1%
46                   259      0.1%
20                   257      0.1%
Other values (8089)  230817   98.8%

Minimum 5 values

Value       Count   Frequency (%)
1           332     0.1%
2           264     0.1%
3           257     0.1%
4           235     0.1%
5           218     0.1%

Maximum 5 values

Value       Count   Frequency (%)
336711      19      < 0.1%
326565      25      < 0.1%
323153      20      < 0.1%
293735      17      < 0.1%
289859      17      < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 234
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 669125.3314878093
Minimum: 4
Maximum: 1489527
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 4
5th percentile: 45746
Q1: 286060
median: 744725
Q3: 995553
95th percentile: 1334838
Maximum: 1489527
Range: 1489523
Interquartile range (IQR): 709493

Descriptive statistics

Standard deviation: 410521.7659
Coefficient of variation (CV): 0.6135199889
Kurtosis: -1.024536983
Mean: 669125.3315
Median Absolute Deviation (MAD): 301942
Skewness: 0.04354448669
Sum: 1.562661917e+11
Variance: 1.685281203e+11
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
880991               5102     2.2%
873252               4354     1.9%
843723               4348     1.9%
886628               4324     1.9%
828119               4204     1.8%
1077849              3887     1.7%
1064039              3851     1.6%
1028641              3828     1.6%
1040347              3810     1.6%
980824               3757     1.6%
Other values (224)   192073   82.2%

Minimum 5 values
ValueCountFrequency (%) 
43< 0.1%
 
78< 0.1%
 
104< 0.1%
 
2028< 0.1%
 
2435< 0.1%
 
Maximum 5 values

Value       Count   Frequency (%)
1489527     2976    1.3%
1450619     3054    1.3%
1421891     3564    1.5%
1334838     3540    1.5%
1331322     3547    1.5%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 236
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1048057.1340252978
Minimum: 4
Maximum: 2538563
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB

Quantile statistics

Minimum: 4
5th percentile: 49064
Q1: 458322
median: 1065599
Q3: 1719589
95th percentile: 2185552
Maximum: 2538563
Range: 2538559
Interquartile range (IQR): 1261267

Descriptive statistics

Standard deviation: 698522.2421
Coefficient of variation (CV): 0.6664925217
Kurtosis: -0.8986604022
Mean: 1048057.134
Median Absolute Deviation (MAD): 628825
Skewness: 0.2923463218
Sum: 2.44761167e+11
Variance: 4.879333227e+11
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
1211567              5102     2.2%
1279749              4354     1.9%
1215804              4348     1.9%
1300629              4324     1.9%
1208398              4204     1.8%
2538563              3887     1.7%
2489495              3851     1.6%
2435526              3828     1.6%
2067486              3810     1.6%
2185552              3757     1.6%
Other values (226)   192073   82.2%

Minimum 5 values
ValueCountFrequency (%) 
43< 0.1%
 
88< 0.1%
 
104< 0.1%
 
2114< 0.1%
 
2411< 0.1%
 
Maximum 5 values

Value       Count   Frequency (%)
2538563     3887    1.7%
2489495     3851    1.6%
2435526     3828    1.6%
2185552     3757    1.6%
2067486     3810    1.6%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct count: 3032
Unique (%): 1.5%
Missing: 36127
Missing (%): 15.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 3496.6998292901612
Minimum: 70.0
Maximum: 287220.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 MiB
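For this column, Unique (%) and Missing (%) use different denominators. The reported 1.5% matches 3032 distinct values divided by the 197411 non-missing rows (233538 - 36127). Dividing by all 233538 rows would give 1.3%. Missing (%), by contrast, is taken over all rows. The sketch below shows that reading. It is an inference from the numbers, not confirmed from the pandas-profiling source, and `distinct_and_missing` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def distinct_and_missing(values: Sequence[object]) -> Dict[str, float]:
    """Distinct/missing block; Unique (%) is taken over non-missing values only."""
    present = [v for v in values if not is_missing(v)]
    n_missing = len(values) - len(present)
    n_distinct = len(set(present))
    return {
        "Distinct count": n_distinct,
        "Unique (%)": 100.0 * n_distinct / len(present),
        "Missing": n_missing,
        "Missing (%)": 100.0 * n_missing / len(values),
    }


if __name__ == "__main__":
    # Report figures for GEMIDDELDE_VERKOOPPRIJS.
    n_rows, n_missing, n_distinct = 233538, 36127, 3032
    print(f"{100 * n_distinct / (n_rows - n_missing):.1f}%")  # over non-missing rows
    print(f"{100 * n_distinct / n_rows:.1f}%")                # over all rows
```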

Quantile statistics

Minimum: 70
5th percentile: 140
Q1: 460
median: 1235
Q3: 4015
95th percentile: 13215
Maximum: 287220
Range: 287150
Interquartile range (IQR): 3555

Descriptive statistics

Standard deviation: 6643.232732
Coefficient of variation (CV): 1.899857882
Kurtosis: 180.997742
Mean: 3496.699829
Median Absolute Deviation (MAD): 1000
Skewness: 8.182229188
Sum: 690287010
Variance: 44132541.13
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
160                  1851     0.8%
105                  1704     0.7%
110                  1416     0.6%
180                  1331     0.6%
300                  1214     0.5%
140                  1182     0.5%
145                  1023     0.4%
295                  997      0.4%
500                  992      0.4%
185                  970      0.4%
Other values (3022)  184731   79.1%
(Missing)            36127    15.5%

Minimum 5 values

Value       Count   Frequency (%)
70          226     0.1%
75          74      < 0.1%
80          359     0.2%
85          828     0.4%
90          441     0.2%

Maximum 5 values

Value       Count   Frequency (%)
287220      8       < 0.1%
148910      3       < 0.1%
142880      4       < 0.1%
122155      4       < 0.1%
116765      3       < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. Consequently, for an exactly linear relationship y = a + bx with b ≠ 0, r is +1 or -1 however steep the line is; only the sign of the slope matters.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
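That recipe translates directly into code. The following is a minimal stdlib sketch. The (n - 1) normalizations of the covariance and of the two standard deviations cancel, so plain sums are used. `pearson_r` is an illustrative helper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors of covariance and standard deviations cancel out.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(pearson_r(x, [2 * v for v in x]))       # steep positive line
    print(pearson_r(x, [7 - 3 * v for v in x]))   # negative slope
```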

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the ranks of X and Y by the product of the standard deviations of those ranks.
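In other words, ρ is Pearson's r applied to ranks. The sketch below follows that definition. It assumes the common convention that tied values receive the average of the ranks they span. The helper names are illustrative.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [v ** 3 for v in xs]                      # nonlinear but monotonic
    print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

With a monotonic but nonlinear relationship such as y = x³, ρ is 1 while Pearson's r stays below 1.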

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.
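The pair-counting definition above is the tau-a variant. Here is a direct O(n²) sketch of it. Tied pairs count as neither concordant nor discordant. Note that SciPy's `scipy.stats.kendalltau` defaults to the tie-adjusted tau-b, so its results can differ when either variable has ties. `kendall_tau_a` is an illustrative helper.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # One swapped pair out of six: (5 - 1) / 6
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```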

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

Sample

First rows

VERSIEDATUM_BESTANDPEILDATUMJAARBEHANDELEND_SPECIALISME_CDTYPERENDE_DIAGNOSE_CDZORGPRODUCT_CDAANTAL_PAT_PER_ZPDAANTAL_SUBTRAJECT_PER_ZPDAANTAL_PAT_PER_DIAGAANTAL_SUBTRAJECT_PER_DIAGAANTAL_PAT_PER_SPCAANTAL_SUBTRAJECT_PER_SPCGEMIDDELDE_VERKOOPPRIJS
01.02020-04-242020-04-012012-01-01301751797990331979201210309211925812967501856605255.0
11.02020-04-242020-04-012012-01-013017517979900355103092119258129675018566052165.0
21.02020-04-242020-04-012012-01-01301751797990131110309211925812967501856605NaN
31.02020-04-242020-04-012012-01-01301751797990419910309211925812967501856605NaN
41.02020-04-242020-04-012012-01-0130175179799007212210309211925812967501856605575.0
51.02020-04-242020-04-012012-01-01301751797990451110309211925812967501856605NaN
61.02020-04-242020-04-012012-01-013017517979904691010309211925812967501856605NaN
71.02020-04-242020-04-012012-01-0130175179799025192110309211925812967501856605570.0
81.02020-04-242020-04-012012-01-013017517979903786293979551030921192581296750185660575.0
91.02020-04-242020-04-012012-01-0130175179799012313110309211925812967501856605680.0

Last rows

VERSIEDATUM_BESTANDPEILDATUMJAARBEHANDELEND_SPECIALISME_CDTYPERENDE_DIAGNOSE_CDZORGPRODUCT_CDAANTAL_PAT_PER_ZPDAANTAL_SUBTRAJECT_PER_ZPDAANTAL_PAT_PER_DIAGAANTAL_SUBTRAJECT_PER_DIAGAANTAL_PAT_PER_SPCAANTAL_SUBTRAJECT_PER_SPCGEMIDDELDE_VERKOOPPRIJS
2335281.02020-04-242020-04-012018-01-01327011299002719889414739421728184576336251220.0
2335291.02020-04-242020-04-012018-01-01327061399002718584853516490818457633625114230.0
2335301.02020-04-242020-04-012018-01-0132706139900271811681703516490818457633625118095.0
2335311.02020-04-242020-04-012018-01-013270613990027131474835164908184576336251165.0
2335321.02020-04-242020-04-012018-01-01327061399002719954457135164908184576336251850.0
2335331.02020-04-242020-04-012018-01-013270613990027180373735164908184576336251NaN
2335341.02020-04-242020-04-012018-01-0132706139900271981314157435164908184576336251220.0
2335351.02020-04-242020-04-012018-01-0132706139900271794435164908184576336251NaN
2335361.02020-04-242020-04-012018-01-01327061399002718621022344351649081845763362513350.0
2335371.02020-04-242020-04-012018-01-013270613990027182737535164908184576336251NaN